Text Embedding Bank for Detailed Image Paragraph Captioning
Authors
Abstract
Existing deep learning-based models for image captioning typically consist of an image encoder to extract visual features and a language model decoder, an architecture that has shown promising results in single high-level sentence generation. However, only the word-level guiding signal is available when the image encoder is optimized to extract visual features. The inconsistency between the parallel extraction of visual features and the sequential text supervision limits its success when the generated text is long (more than 50 words). We propose a new module, called the Text Embedding Bank (TEB), to address this problem for image paragraph captioning. This module uses the paragraph vector model to learn fixed-length feature representations from a variable-length paragraph. We refer to this fixed-length feature as the TEB. The TEB module plays two roles that benefit paragraph captioning performance. First, it acts as a form of global and coherent deep supervision to regularize visual feature extraction in the image encoder. Second, it acts as a distributed memory that provides features of the whole paragraph to the language model, which alleviates the long-term dependency problem. Adding this module to existing state-of-the-art methods achieves a new state-of-the-art result on the Stanford Visual Genome paragraph captioning dataset.
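The two roles described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear projection, the mean-squared-error loss, the concatenation scheme, and all names and sizes below are assumptions chosen for illustration, and the target embedding is a random stand-in for a real paragraph vector.

```python
import random


def project(features, weights):
    """Linear map from image features (length d_in) to an embedding (length d_out)."""
    d_out = len(weights[0])
    return [sum(f * row[j] for f, row in zip(features, weights)) for j in range(d_out)]


def embedding_loss(predicted, target):
    """Mean squared error between a predicted and a target paragraph embedding.

    Assumed loss: the abstract only says the embedding regularizes the encoder.
    """
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def decoder_inputs(word_vectors, paragraph_embedding):
    """Append the same fixed-length embedding to every word vector (assumed scheme)."""
    return [w + paragraph_embedding for w in word_vectors]


rng = random.Random(0)
D_IMG, D_TEB, D_WORD, N_WORDS = 8, 4, 3, 5  # hypothetical sizes

image_features = [rng.uniform(-1, 1) for _ in range(D_IMG)]
weights = [[rng.uniform(-1, 1) for _ in range(D_TEB)] for _ in range(D_IMG)]
# Stand-in for a fixed-length vector learned from a variable-length paragraph.
target_embedding = [rng.uniform(-1, 1) for _ in range(D_TEB)]
word_vectors = [[rng.uniform(-1, 1) for _ in range(D_WORD)] for _ in range(N_WORDS)]

# Role 1: a paragraph-level training signal for the image encoder.
predicted = project(image_features, weights)
aux_loss = embedding_loss(predicted, target_embedding)

# Role 2: the embedding is visible to the decoder at every time step.
inputs = decoder_inputs(word_vectors, predicted)
```

In a real model the auxiliary loss would be added to the usual word-level loss, so the encoder receives one signal for the whole paragraph in addition to the per-word signal.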
Similar resources
Text-Guided Attention Model for Image Captioning
Visual attention plays an important role in understanding images and has proven effective in generating natural language descriptions of images. On the other hand, recent studies show that language associated with an image can steer visual attention in the scene during our cognitive process. Inspired by this, we introduce a text-guided attention model for image captioning, which learns t...
Document Embedding with Paragraph Vectors
Paragraph Vectors has been recently proposed as an unsupervised method for learning distributed representations for pieces of texts. In their work, the authors showed that the method can learn an embedding of movie review texts which can be leveraged for sentiment analysis. That proof of concept, while encouraging, was rather narrow. Here we consider tasks other than sentiment analysis, provide...
Conditional Image-Text Embedding Networks
This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. In order to differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments. Our proposed solution simplifies th...
Improving Automatic Image Captioning Using Text Summarization Techniques
This paper presents two different approaches to automatic captioning of geo-tagged images by summarizing multiple web-documents that contain information related to an image’s location: a graph-based and a statistical-based approach. The graph-based method uses text cohesion techniques to identify information relevant to a location. The statistical-based technique relies on different word or nou...
Specialising Paragraph Vectors for Text Polarity Detection
This paper presents some experiments for specialising Paragraph Vectors, a new technique for creating text fragment (phrase, sentence, paragraph, text, ...) embedding vectors, for text polarity detection. The first extension regards the injection of polarity information extracted from a polarity lexicon into embeddings and the second extension aimed at inserting word order information into Para...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i18.17892